Scaling Up or Out

Authors

  • Al Talkington
  • Kaivalya Dixit
Abstract

Insatiable demands for performance and the availability of inexpensive microprocessors have led to the development of multiprocessor-based systems. Passionate debates among architects, designers, and vendors continue to rage over cost-effective multiprocessor design alternatives. Hype and claims of performance and scalability are confusing and often misleading. Increasing performance and reliability by replicating subsystems and/or full systems are old concepts. Even the definition of what constitutes a "large" number of processors has changed over time, and will continue to evolve. This paper addresses performance scalability issues on Uniform Memory Access (UMA) Symmetric Multiprocessor (SMP) systems with up to 128 processors.

Executive Summary

Scalability is significantly influenced by the hardware configuration, the software configuration, and the workload. There are a very large number of workloads, and their characteristics are continually changing as new technologies and applications are developed. Scalable systems (Scale Up or Scale Out) should scale on more than one application or benchmark. Increasing performance by adding more processors is commonly referred to as "scaling up"; increasing performance by adding complete systems is referred to as "scaling out". At the hardware level, SMP systems replicate processors and caches, share global memory and I/O, and also share the connections between these devices. Our observations will show that the efficiency, scalability, reliability, and cost effectiveness of Scale Up systems degrade beyond 32 modern high-performance microprocessors. The primary reasons for this degradation are bottlenecks caused by the ever-widening gap between processor and memory speeds, and contention for shared resources (e.g., bandwidth, memory, operating system). Clusters, or Scale Out systems, replicate complete systems interconnected via a variety of interconnect mechanisms.
In Scale Out systems, performance and scalability are limited by both the speed and efficiency of inter-node communication and by workload management. Given fast-changing computing dynamics and the long-term investment in large enterprise systems, it is imperative to configure a judicious combination of efficient Scale Up and easy-to-manage Scale Out systems to meet both the current and future computing needs of customers.

Introduction

The evolution of processor technologies and architectures has validated Moore's law, with CPU clock rates doubling every 12 to 18 months for the last 20 years. The speed of DRAM is following a very different pace: memory technology development has focused on increasing density and reducing cost. Since 1997, CPU clock rates have jumped nearly an order of magnitude (from 300 MHz to 2000 MHz). During this same period, memory chip speeds have only managed to double (from 100 MHz to 266 MHz (DDR)). Interconnect (CPU ↔ CPU and CPU ↔ Memory) speeds and bandwidth have also grown much more slowly than processor clock speeds. As a result, the clock-rate gap between the CPU and main memory is widening rapidly. Computer architects, memory system designers, and computer system designers have been exploiting a myriad of techniques and exotic mechanisms (e.g., super-scaling, pipelining, multiple levels of cache, additional load/store units, speculative scheduling, hardware multithreading (HMT), separate and wide memory buses, double-data-rate DRAM, and exotic interconnect technologies) to keep the CPU performing useful work during long waits (many CPU cycles) for data from memory.

IBM ~ Performance Technical Report

At the same time, customers have been given the perception that a higher clock rate means proportionately higher performance, so a customer expects a 1000 MHz system to deliver 2X the performance of a 500 MHz system.
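The widening gap described above is simple compound-growth arithmetic. As a rough illustration, using only the clock figures quoted in this section (the script itself is not from the report):

```python
# Illustrative arithmetic only: compound annual growth rates implied by the
# clock figures quoted above (1997-2002, CPU 300->2000 MHz, DRAM 100->266 MHz).

def cagr(start, end, years):
    """Compound annual growth rate over the given span."""
    return (end / start) ** (1 / years) - 1

years = 2002 - 1997
cpu_growth = cagr(300, 2000, years)   # ~46% per year
mem_growth = cagr(100, 266, years)    # ~22% per year

# The CPU-to-memory clock ratio widens from 3x to about 7.5x over the span.
ratio_1997 = 300 / 100
ratio_2002 = 2000 / 266

print(f"CPU clock CAGR:    {cpu_growth:.1%}")
print(f"Memory clock CAGR: {mem_growth:.1%}")
print(f"Clock gap: {ratio_1997:.1f}x -> {ratio_2002:.1f}x")
```

Compounding a roughly two-to-one difference in annual growth for five years is what produces the "rapid rate" of divergence the authors describe.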
Unfortunately, this is simply not the case in most real-world environments. In fact, processor clock-rate improvements may even reduce a specific application's performance, depending on the demands placed on the other system resources. To meet the performance demands of modern enterprise systems, vendors offer a variety of multiprocessor-based systems. Entry, midrange, and some high-end systems are frequently offered as shared-memory systems, also known as Scale Up systems. To attain higher performance and ease the programming burden, these systems are typically designed as Symmetric Multiprocessors (SMPs), such that all CPUs uniformly access memory and other system resources (I/O, disk, etc.) through a bus, crossbar switch, or backplane. This means that only the CPUs and caches are replicated; other resources, including the interconnect mechanism, main memory, and operating system, are shared amongst the processors. On workloads that fit in the caches and require little or no access to memory or operating system resources, the user may attain almost linearly scaled performance increases as processors are added. Real applications are rarely that simple. By definition, sharing means contention, and contention means sub-linear scalability and decreased performance. An alternative architecture that can add performance is the "shared-nothing" system, also known as Clusters or Scale Out systems. Clusters are made up of interconnected nodes, which can be either uniprocessors or SMPs. For some workloads, a cluster may be a cost-effective approach to scale on a benchmark or an application. Given a workload that can be partitioned across multiple nodes, the trick to optimizing performance in this environment is workload balancing and system management. Balancing and managing clusters has long been more of an art than an exact science, but significant progress has been made in usable tools.
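One way to see why "sharing means contention" forces sub-linear scalability is an Amdahl's-law-style model, where a serial fraction stands in for time spent contending for shared memory, buses, and operating-system locks. This is an illustrative sketch rather than a model from the report, and the 5% figure is an assumption:

```python
# Hypothetical back-of-envelope model (not from the paper): Amdahl's law with
# a serial fraction standing in for contention on shared resources.

def amdahl_speedup(n_cpus, serial_fraction):
    """Best-case speedup on n CPUs when serial_fraction of the work
    cannot be parallelized (e.g. time serialized on shared memory or
    operating-system locks)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

# Even 5% serialized work caps speedup well below linear:
for n in (8, 32, 128):
    print(f"{n:3d} CPUs -> {amdahl_speedup(n, 0.05):.1f}x")
```

With a 5% serialized fraction, 32 CPUs yield roughly a 12.5x speedup and 128 CPUs only about 17.4x, which is consistent with the paper's observation that Scale Up efficiency degrades beyond 32 processors.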
With clusters you may or may not attain higher performance on a given application, but you may attain better reliability with appropriate high-availability software: replication of the hardware and software eliminates many common points of failure. There is a spectrum of architectures between the classic SMP and clustered systems. Broadly, these include NUMA systems, Massively Parallel Processor (MPP) systems, tightly coupled cluster systems, and highly available clustered systems, with many variations around these basic approaches. All of them exploit features of the basic SMP/cluster designs, while attempting to mitigate the limitations, to address specific needs in final solutions. Discussions in this paper are limited to UMA shared-memory multiprocessor systems.

Hardware is not the only component to be considered in scaling. While most buyers associate performance and scalability mainly with clock rates, the number of CPUs, and the number of disks, scaling is also largely defined by other system components and by software performance. Obviously, once an architecture is implemented in hardware, further scaling must be done by the software. This scaling is addressed by programming models, application tuning, operating system tuning, compiler libraries, database software tuning, and other techniques. While we focus on hardware scaling in this paper, it is impossible to completely eliminate the effects of the software.

Scalability

The simplest definition of "perfect scalability" is: no performance limitation due to hardware, software, or the size of the computing problem. Unfortunately, this is similar to flying at the speed of light; no one can get there. Scalability is basically a metric that indicates the performance benefit of multiprocessor-based Scale Up systems and Scale Out systems. While it is defined in many ways, it is not a precise metric and is often confused with speed, throughput, and system configuration.
In general, a good scalable system should scale the performance of your application in a predictable manner when you increase or decrease the configuration. There are numerous components involved in scalability but, broadly, processor architectures exploit instruction-level parallelism while minimizing latency effects, and multiprocessor architectures exploit application- and system-software-level parallelism while minimizing contention for shared resources. The combination is intended to improve both speedup and throughput. Scale Up and Scale Out systems provide scalability in different domains (workloads). Additionally, there are two distinct types of scalability: throughput and response time (how much work) and speedup (how much faster).

Throughput and response-time scalability is defined as a proportionate increase in the throughput of work (e.g., number of transactions) at constant response time, achieved by adding processors, disks, and other devices that permit more work to be completed within a specified time (e.g., TPC-C, TPC-H, SPECjbb). This measure is mostly used to characterize commercial and integer workloads. Scalability implies that the resultant improvement should be proportional to the increase in system resources. Ideally, each added processor enables an equal amount of additional work to be completed in the same time. In the worst case, additional processors enable no additional work, or may even reduce the amount of work that can be completed. In reality, a system will usually fall into the "typical" area between these curves.

Speedup scalability is defined as completing the same amount of work in less time by adding processors, disks, and other devices for a given problem size (e.g., Fluent, SPEComp). Again, scalability implies that the resultant improvement should be proportional to the increase in system resources.
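The throughput definition above can be expressed as a simple efficiency ratio: achieved throughput with n processors divided by n times the single-processor throughput. The transaction counts below are made-up figures for illustration only:

```python
# Scaling efficiency for throughput-style scalability, measured against a
# single-processor baseline at constant response time. Numbers are made up.

def scaling_efficiency(throughput_n, throughput_1, n_cpus):
    """Fraction of ideal (linear) throughput achieved with n_cpus."""
    return throughput_n / (n_cpus * throughput_1)

# e.g. 1 CPU -> 1000 tx/min; 32 CPUs -> 24000 tx/min (hypothetical)
eff = scaling_efficiency(24000, 1000, 32)
print(f"{eff:.0%} of linear scaling")  # prints "75% of linear scaling"
```

An efficiency of 1.0 corresponds to the ideal curve, 0 to the worst case, and real systems land in the "typical" band between them.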
In an ideally scalable solution, the time required to complete a task decreases linearly as processors are added. Worst-case scalability, of course, means that added resources do nothing to improve the elapsed time. In reality, a system will usually fall into the "typical" area between these curves; in some cases the addition of resources may even increase the elapsed time, due to a lack of adequate parallelism in the workload.

All vendors claim that their system is scalable and cite a benchmark or application result that shows almost ideal (linear) scalability. Many customers mistakenly equate SMP scalability with execution speed. In fact, some popular benchmarks (e.g., Dhrystone and SPECint_rate) frequently do demonstrate almost linear scalability, because the entire benchmark fits in the cache of most modern systems. Unfortunately, the real world rarely fits in the cache. The footprints (working-set sizes) of most applications, particularly commercial applications, are very large and expected to increase over time. When the footprint of a program is larger than the cache, processors must go to main memory to fetch data. Unless ...

[Figure: Throughput — work performed vs. resources]
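The speedup definition can be sketched the same way, as the ratio of single-processor elapsed time to n-processor elapsed time for a fixed problem size. The timings below are hypothetical:

```python
# Speedup-style scalability: reduction in elapsed time for a fixed problem
# size as processors are added. All timings are hypothetical.

elapsed = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}  # seconds (made up)

for n, t in elapsed.items():
    speedup = elapsed[1] / t
    print(f"{n:2d} CPUs: speedup {speedup:.2f}x, efficiency {speedup / n:.0%}")
```

Note how per-processor efficiency falls as processors are added even while absolute speedup rises, which is the "typical" curve the text describes.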



Publication date: 2002